Efficient Ad-hoc Approximate Query Processing in Peer-to-Peer Databases
Authors
Abstract
This paper appeared in the 22nd International Conference on Data Engineering (ICDE), Atlanta, Georgia, 2006.

Peer-to-peer databases are becoming prevalent on the Internet for distributing and sharing documents, applications, and other digital media. Answering large-scale, ad-hoc analysis queries (e.g., aggregation queries) on these databases poses unique challenges. Exact solutions can be time-consuming and difficult to implement, given the distributed and dynamic nature of peer-to-peer databases. In this paper we present novel sampling-based techniques for approximately answering ad-hoc aggregation queries in such databases. Several factors make it hard to compute a high-quality random sample of the database efficiently in the P2P environment. The data is distributed across many peers, usually in uneven quantities. The data within each peer is often highly correlated. Even collecting a random sample of the peers is difficult. To counter these problems, we have developed an adaptive two-phase sampling approach based on random walks of the P2P graph and on block-level sampling techniques. We present extensive experimental evaluations to demonstrate the feasibility of our proposed solution.
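The abstract only outlines the approach. The sketch below illustrates the two ingredients it names, random-walk peer sampling and block-level sampling, under several assumptions:

- The overlay is an undirected, connected, non-bipartite graph given as an adjacency dict.
- Each peer's local data is a plain list of numbers.
- The query is an AVG.

The helpers `random_walk_peers` and `estimate_avg` are hypothetical and are not the authors' code. A simple random walk visits each peer with probability proportional to its degree. The sketch therefore down-weights each sampled peer's whole block by 1/degree, which is a standard degree-corrected ratio estimator. It does not model the paper's adaptive two-phase step or its treatment of within-peer correlation.

```python
import random


def random_walk_peers(graph, start, num_samples, burn_in=50, thin=5, seed=None):
    """Sample peers with a simple random walk over the overlay graph.

    graph: dict mapping peer id -> list of neighbour ids (assumed undirected,
    connected and non-bipartite). After `burn_in` hops, every `thin`-th peer
    visited is recorded. In the stationary regime, peer p is visited with
    probability proportional to its degree len(graph[p]).
    """
    rng = random.Random(seed)
    current = start
    samples = []
    step = 0
    while len(samples) < num_samples:
        current = rng.choice(graph[current])  # hop to a uniformly chosen neighbour
        step += 1
        if step > burn_in and (step - burn_in) % thin == 0:
            samples.append(current)
    return samples


def estimate_avg(graph, local_data, sampled_peers):
    """Degree-corrected ratio estimate of AVG over all tuples on all peers.

    Block-level sampling: each sampled peer contributes its entire local
    block. Because the walk favours high-degree peers, both the block sum
    and the block size are weighted by 1 / degree before forming the ratio.
    """
    weighted_sum = 0.0
    weighted_count = 0.0
    for p in sampled_peers:
        block = local_data[p]
        w = 1.0 / len(graph[p])
        weighted_sum += w * sum(block)
        weighted_count += w * len(block)
    if weighted_count == 0:
        raise ValueError("sampled peers hold no tuples")
    return weighted_sum / weighted_count


# Hypothetical toy overlay: a triangle a-b-c (so the walk is aperiodic) plus leaf d.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}
local_data = {"a": [1, 2], "b": [3], "c": [4, 5, 6], "d": [7]}

peers = random_walk_peers(graph, start="a", num_samples=200, seed=1)
print(estimate_avg(graph, local_data, peers))
```

On this toy overlay the exact average is 28/7 = 4. With enough sampled peers the estimate should land near that value, though individual runs will vary. The 1/degree weighting cancels the walk's bias without requiring any global quantity such as the total number of edges. That independence from global knowledge is why the sketch uses a ratio (AVG) rather than a SUM or COUNT.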
Similar sources
Partial read from peer-to-peer databases
In this paper we propose Scoop, a mechanism to implement the "partial read operation" for peer-to-peer databases. A peer-to-peer database is a database whose relations are horizontally fragmented and distributed among the nodes of a peer-to-peer network. The partial read operation is a data retrieval operation required for approximate query processing in peer-to-peer databases. A partial rea...
An Efficient Hybrid Algorithm to Reduce Latency in Ad-Hoc Aggregation
A data warehouse is a collection of data gathered and organized so that it can easily be analyzed, extracted, synthesized and used to further understand the data. Peer-to-peer networks are used for distributing and sharing documents. In traditional techniques, when aggregate functions like average, sum and count are encountered, the aggregate operation is performed by ...
A Gnutella-based P2P System Using Cross-Layer Design for MANET
The ubiquitous era is expected to arrive soon. A ubiquitous environment has peer-to-peer and nomadic characteristics. These characteristics are represented by peer-to-peer systems and mobile ad-hoc networks (MANETs). P2P systems and MANETs have similar features, which makes implementing P2P systems in a MANET environment appealing. However, it has been shown that the performance of the P...
Context-Aware Query Processing in Ad-Hoc Environments of Peers
In this article, we deal with context-aware query processing in ad-hoc peer-to-peer networks. Each peer in such an environment has a database over which users execute queries. This database involves (a) relations which are locally stored and (b) virtual relations, all the tuples of which are collected from peers that are present in the network at the time when a query is posed. The objective of...
An Enhanced Searching Algorithm over Unstructured Mobile P2P Overlay Networks
To discover objects of interest in unstructured peer-to-peer networks, peers rely on flooding query messages, which creates heavy network traffic. This article evaluates the performance of an unstructured Gnutella-like protocol over mobile ad-hoc networks and proposes modifications to improve its performance. This paper offers an enhanced mechanism for an unstructured Gnutella-like netwo...
Publication date: 2006